Mapping of Sequence Reads to the Reference Genomes    ◾    73

So far, we have two directories: “data” where the FASTQ files were downloaded and

“refgenome” where the FASTA sequence of the reference genome was downloaded and

indexed.

In the next step, we will use any of the alignment algorithms to map the reads to the

human reference genome. The “bwa mem” algorithm can be tried first as recommended

by the BWA.

1-BWA-MEM algorithm

In the following, the “bwa mem” command maps the reads in the FASTQ files to the

human reference genome and saves the output file, which contains the mapped reads in a

new directory called “bwa_sam”. Before running the commands, make sure that the work-

ing directory is just one level out of the two directories.

mkdir sam

bwa mem \

-t 4 \

refgenome/GRCh38.p13_ref.fna \

data/SRR769545_1.fastq.gz \

data/SRR769545_2.fastq.gz \

> sam/SRR769545_mem.sam 2> sam/SRR769545_mem.log

The “bwa mem” command has the capability of mapping reads of a length ranging between

70 bp and 1 Mbp. The BWA-MEM algorithm uses seeding alignments with maximal exact

matches (MEMs) and then it extends seeds with the affine-gap SW algorithm. We can run

“bwa mem” on the command line without any option to learn more about “bwa mem”

command. In the above, we use “-t 4” so that the program can use four parallel threads to

perform read mapping. Then, we provided the path of the FASTA file name of the reference

genome and the two FASTQ file names. The output of the aligned reads will be redirected

to a new file “SRR769545_mem.sam” using the Unix/Linux redirection symbol “>”. The

command execution comments are redirected to the log file “SRR769545_mem.log” using

“2>”, which is used in Unix/Linux to redirect stderr (standard error) to a text file. Both the

output files will be saved in the “sam” directory. The log file contains important informa-

tion about the comments and any standard error that occurs during the mapping process.

Always open the log file if the running fails. The log file will tell you if the failure is due to

the wrong syntax, wrong file path, or any other standard error. Even if the program has

finished running normally, it is a good practice to open the log file and look at the statistics

and comments.

cd sam

less SRR769545_mem.log

You can also display the SAM file produced by “bwa mem” with the following:

less -S SRR769545_mem.sam